Skip to content

Conversation

danlu1
Copy link
Contributor

@danlu1 danlu1 commented Oct 3, 2025

Problem:

At present, delete_rows accepts only a table query to provide ROW_ID and ROW_VERSION values for deletion. This will be extended to also accept a dataframe, replicating the deprecated function’s ability to specify rows through a row_id_vers_list.

Solution:

  1. Add a new param to delete_rows so rows can be deleted with a filtered dataframe.
  2. Tutorial has been updated to reflect this change.

Testing:

Unit test and integration test are added and they all passed.

@danlu1 danlu1 requested a review from a team as a code owner October 3, 2025 00:48
Copy link
Contributor

@jaymedina jaymedina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far! 🔥

I did an initial pass-through and was wondering: What happens when a user feed a dataframe with row_ids/versions/etags that don't exist in that table? what does the API spit back and is that error-handling enough?

@danlu1 danlu1 changed the title [GEN-1667] [GEN-1667] Delete Table Rows Using Filtered DataFrame Oct 3, 2025
@danlu1
Copy link
Contributor Author

danlu1 commented Oct 3, 2025

@jaymedina Thanks for the feedback, I will address them accordingly.

"The dataframe must contain the 'ROW_ID' and 'ROW_VERSION' columns."
)

if self.__class__.__name__ in CLASSES_THAT_CONTAIN_ROW_ETAG:
Copy link
Contributor Author

@danlu1 danlu1 Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanFauble When I add test case for the below ValueError, I got the AttributeError: 'Dataset' object has no attribute 'delete_rows_async' for my test case of Dataset. As this line, I believe we originally intent to use delete_row in the Dataset, EntityView, DatasetCollection, and SubmissionView but I didn't find the function in any of the models. Have we not had the chance to add this function yet?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these other items are special @danlu1 . You can't actually delete rows of those Entity types, the reason is because there is a separate items attribute on the entity itself that dictates what should be present on the table:

https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/Dataset.html

If you notice that:

class Dataset(
    DatasetSynchronousProtocol,
    AccessControllable,
    ViewBase,
    ViewStoreMixin,
    DeleteMixin,
    ColumnMixin,
    GetMixin,
    QueryMixin,
    ViewUpdateMixin,
    ViewSnapshotMixin,
):

Doesn't contain the TableDeleteRowMixin which gives this ability to the class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming!

@BryanFauble BryanFauble self-requested a review October 6, 2025 15:57
Copy link
Member

@BryanFauble BryanFauble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No major complaints from me. Thanks for making these changes!

Copy link
Member

@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 awesome work and great review. I'll defer to Bryan for final review once this is done.

Copy link
Contributor

@linglp linglp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @danlu1 ! I tested the examples, and they worked for the most part. However, when I tried deleting a row with wrong version numbers, I expected an error but the request actually went through. Here's an example:

async def main():
    results = await query_async(query="SELECT * FROM syn63301937")
    print(results)

Here I saw:

Downloading files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 576/576 [00:00<00:00, 1.38kB/s, syn63301937]
   ROW_ID  ROW_VERSION     Component                                           Filename                                    Id     entityId
0      12            6  MockFilename    schematic - main/MockFilenameComponent/txt3.txt  6b27d97b-0344-4499-8f08-8535ad72b802  syn61682653
1      13            6  MockFilename  schematic - main/MockFilenameComponent/this_fi...  7185f1ee-00bb-42d4-9051-1b61166fde0f  syn61682653
2      14            6  MockFilename    schematic - main/MockFilenameComponent/txt4.txt  9ce08bef-35a0-40d5-8b06-96c6071b1609   syn6168265
3      15            6  MockFilename    schematic - main/MockFilenameComponent/txt6.txt  07973efb-7d36-475a-b29f-0aa470c722b5          NaN

As you can see, the row version number is 6. But when I did:

df = pd.DataFrame({"ROW_ID": [10, 11], "ROW_VERSION": [100, 100]})

async def main():
    await Table(id="syn63301937").delete_rows_async(df=df)
asyncio.run(main())

I expected the request to fail because row version 100 doesn't exist. But the request appeared to go through:

(synapsePythonClient) lpeng@Mac synapsePythonClient % python3 /Users/lpeng/code/synapsePythonClient/test.py
Welcome, Lingling Peng! You are using the 'default' profile.
Found 2 rows to delete for given dataframe.
Uploading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 33.0/33.0 [00:00<00:00, 68.8B/s, syn63301937_upload_636feaba-4b89-4d20-961c-034866078060.csv]
/entity/syn63301937/table/transaction/async: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.00/1.00 [00:01<00:00, 1.21s/it]

@danlu1
Copy link
Contributor Author

danlu1 commented Oct 7, 2025

@linglp Thanks for testing the code. Can you do me a favor and confirm the actual row with wrong row_version were deleted? The "Found 2 rows to delete for given dataframe." printed before the row deletion so I'd like to learn the behavior of the code. Appreciate your help!

@danlu1
Copy link
Contributor Author

danlu1 commented Oct 10, 2025

@linglp Thanks for letting me test on your table. I confirm that
Original Table:
Screenshot 2025-10-10 at 12 26 35 PM

  1. If none of the ROW_ID and ROW_VERSION are in the table, no row would be deleted but we still get a progress bar for delete_row.
Screenshot 2025-10-10 at 12 30 55 PM 2. If the `ROW_ID` is in the table but none of `ROW_VERSION` matches, the row with correct row_id would be deleted. Screenshot 2025-10-10 at 12 49 44 PM 3. If no `ROW_ID` is in the table but `ROW_VERSION` matches, no row will be removed but we still get a progress bar for delete_row. Screenshot 2025-10-10 at 12 56 04 PM

It looks the ROW_ID is the key to delete rows and we don't want delete a row accidentally if row_version is not correct. I will add a check for ROW_ID and ROW_VERSION.

Copy link
Member

@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 Great work here, I took a quick glance and looked at all the tests - feel free to merge after your added check between ROW_ID and ROW_VERSION.

@danlu1 danlu1 requested review from linglp and thomasyu888 October 13, 2025 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants